Entropy-based analysis of the number partitioning problem 



A.R. Lima (1,2) and M. Argollo de Menezes (2)
(1) Laboratoire de Physique et Mécanique des Milieux Hétérogènes, ESPCI Paris
10 rue Vauquelin, 75231 Paris Cedex 05, France
(2) Instituto de Física, Universidade Federal Fluminense
Av. Litorânea, 24210-340, Niterói, RJ, Brazil

Version: 12/07/2000 - Printed: February 1, 2008 



Abstract 

In this paper we apply the multicanonical method of statistical physics to the number-partitioning problem
(NPP). This problem is a basic NP-hard problem from computer science, and can be formulated as a spin-glass
problem. We compute the spectral degeneracy, which gives us information about the number of solutions for a
given cost E and cardinality m. We also study an extension of this problem to Q partitions. We show that a
fundamental difference in the spectral degeneracy of the generalized (Q > 2) NPP exists, which could explain why
it is so difficult to find good solutions for this case. The information obtained with the multicanonical method can
be very useful in the construction of new algorithms.



1 Introduction

The use of statistical mechanics tools to understand the main ideas underlying problems as diverse as biological, social
and economic systems has become a common task, both theoretically and computationally [?]. Recently, these
tools have been applied to computer science problems [?-?], not intending to solve them exactly, but to understand
their complexity and the underlying mechanisms generating such complex behavior. In this paper we focus on the
number partitioning problem, a fundamental problem in theoretical computer science [?]. Our aim is to apply the
multicanonical method (MUCA) [12] of statistical physics to this problem and study the behavior of nearly optimal
solutions. This information is important for the development of new algorithms which try to find optimal solutions.

In the next section we discuss the number partitioning problem and its formulation as a spin-glass problem. We
also introduce the multi-partitioning problem and map it onto a Q-state Potts model. In section 3 we present the
Lee formulation of the multicanonical method and apply it to our problem. In section 4 we discuss our results, both
for the classical and for the multi-partitioning problem.



2 The number partitioning problem 

The number partitioning problem (NPP) is, according to Garey and Johnson, one of the six basic computer science
problems. Given a set A = {a_1, a_2, a_3, ..., a_N} with N integer numbers, the traditional NPP consists of partitioning
the set A into two disjoint sets A_1 and A_2 such that the difference

    E = \left| \sum_{a_i \in A_1} a_i - \sum_{a_i \in A_2} a_i \right|    (1)

is minimized. If there are N_1 numbers in the set A_1 and N_2 numbers in the set A_2, then

    m = N_2 - N_1    (2)

is called the cardinality of the set.

In the unbalanced NPP the only condition is to minimize the cost function (eq. 1), without any restriction on
the value of m. The problem of finding good solutions (whenever they exist) for the unbalanced (non-fixed m) NPP
was essentially solved by the deterministic Karmarkar-Karp-Korf complete algorithm [?, ?]. This algorithm was
generalized by Mertens to the balanced (m fixed) case. Some recent papers addressed the possibility of carrying
out a statistical analysis of this problem [5, ?, ?-?], obtaining interesting results, such as: the existence of an
easy-to-hard transition [?], the non-self-averaging property of the ground state energy [?], the analytical derivation of the
lower bounds for the energy as a function of the cardinality [?] and the equivalence of the NPP to a random cost
problem [?].
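
The differencing idea behind the Karmarkar-Karp heuristic mentioned above can be illustrated with a minimal Python sketch. It is not the complete Korf algorithm and not the authors' code: it performs only the greedy differencing pass, and the function name kk_differencing and the five-number instance are assumptions made for this illustration.

```python
import heapq


def kk_differencing(numbers):
    """Residual difference left by the greedy differencing pass (illustrative)."""
    # heapq is a min-heap, so negated values are stored to pop the largest first.
    heap = [-a for a in numbers]
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        # Placing the two largest numbers in opposite sets leaves their difference.
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0


example = [8, 7, 6, 5, 4]  # hypothetical toy instance
print(kk_differencing(example))
```

For this toy set the greedy pass leaves a residue of 2, although the perfect partition {8, 7} versus {6, 5, 4} exists; this gap is the reason a complete search is needed to guarantee optimality.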

For that purpose, a mapping of the NPP onto a spin-glass model was proposed: associate with each
number a_i a new variable s_i (which we call "spin") such that s_i = -1 if a_i ∈ A_1, and s_i = +1 otherwise. With this
mapping, we can search for a configuration of spins s_1, ..., s_N which minimizes the cost function (or energy)

    E = \left| \sum_{i=1}^{N} a_i s_i \right|    (3)

or its square,

    H = E^2 = \sum_{i,j=1}^{N} J_{ij} s_i s_j    (4)

with J_ij = a_i a_j, which we recognize as an infinite-range spin-glass Hamiltonian. We can also write down the cardinality
in a "magnetization-like" fashion,

    m = \sum_{i=1}^{N} s_i    (5)

Finding an optimal solution for the number partitioning problem consists of finding the spin configuration of the
ground state of the spin-glass problem. This is a very difficult task, mainly because of the great number of
metastable states separated by a hierarchy of increasingly high energy barriers [?].
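
To make the spin mapping concrete, here is a small Python sketch. It assumes that the energy is |sum_i a_i s_i|, that the magnetization is the plain sum of the spins, and that the squared cost is the pairwise sum with couplings J_ij = a_i a_j; the function names and the toy instance are hypothetical.

```python
def energy(numbers, spins):
    """Cost E = |sum_i a_i * s_i| of a spin configuration (each s_i is -1 or +1)."""
    return abs(sum(a * s for a, s in zip(numbers, spins)))


def magnetization(spins):
    """Cardinality written as a magnetization: m = sum_i s_i (assumed form)."""
    return sum(spins)


def hamiltonian(numbers, spins):
    """Squared cost as a pairwise sum with couplings J_ij = a_i * a_j."""
    n = len(numbers)
    return sum(
        numbers[i] * numbers[j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(n)
    )


numbers = [8, 7, 6, 5, 4]        # hypothetical toy instance
perfect = [-1, -1, +1, +1, +1]   # {8, 7} in A_1, {6, 5, 4} in A_2
lopsided = [+1, +1, +1, +1, -1]  # only the number 4 in A_1

print(energy(numbers, perfect), magnetization(perfect))
print(energy(numbers, lopsided), hamiltonian(numbers, lopsided))
```

The pairwise sum costs O(N^2) per evaluation and is shown only to check that it equals the square of the energy; in a simulation one would track the signed sum and update it spin by spin.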

A much more difficult problem is multi-partitioning. This problem consists of partitioning the set of numbers A
into Q disjoint sets such that the energy and magnetization associated with this partition assume the values E and
m, respectively. This problem has several applications [?, ?], such as the division of N different jobs (computer
programs) among Q computers.

We can map this problem onto a Potts spin-glass by assigning to each number a_i a spin s_i which can assume integer
values from 1 to Q. The value of each spin labels the set to which the number belongs. Hence, the energy can be
written as

    E = \sum_{i=1}^{Q} \sum_{j>i}^{Q} \left| A_i - A_j \right|    (6)

where A_i = \sum_{k=1}^{N} a_k \delta(s_k, i) is the sum of the elements in the set i. In the same way we define the magnetization as

    m = \sum_{i=1}^{Q} \sum_{j>i}^{Q} \left| N_i - N_j \right|    (7)

where N_i = \sum_{k=1}^{N} \delta(s_k, i) is the number of elements in the set i.

Clearly this problem is much more complex than the traditional NPP (Q = 2). 
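
A short Python sketch of the Potts encoding may help here. The energy follows the pairwise form of eq. (6); the magnetization is assumed to take the same pairwise form with set sizes in place of set sums, which is an assumption of this sketch, as are all function names and the toy data.

```python
from itertools import combinations


def set_sums_and_sizes(numbers, spins, q):
    """Return the lists (A_i) and (N_i) for i = 1..q; spin value i labels set i."""
    sums = [0] * q
    sizes = [0] * q
    for a, s in zip(numbers, spins):
        sums[s - 1] += a
        sizes[s - 1] += 1
    return sums, sizes


def potts_energy(numbers, spins, q):
    """Sum of |A_i - A_j| over all pairs of sets i < j."""
    sums, _ = set_sums_and_sizes(numbers, spins, q)
    return sum(abs(x - y) for x, y in combinations(sums, 2))


def potts_magnetization(numbers, spins, q):
    """Sum of |N_i - N_j| over all pairs of sets i < j (assumed form)."""
    _, sizes = set_sums_and_sizes(numbers, spins, q)
    return sum(abs(x - y) for x, y in combinations(sizes, 2))


numbers = [8, 7, 6, 5, 4]   # hypothetical toy instance
spins = [1, 2, 3, 3, 2]     # sets: {8}, {7, 4}, {6, 5}
print(potts_energy(numbers, spins, 3), potts_magnetization(numbers, spins, 3))
```

With Q = 2 the energy reduces to the absolute difference of the two subset sums, so the same functions also cover the traditional problem.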

In the next section we show that the multicanonical method can be used to determine the spectral degeneracy
of the problem, i.e., the number of solutions g(E, m) which have a given energy E and magnetization m. In the
statistical mechanics sense, this completely characterizes the problem, since the (dimensionless) entropy is given by
S(E, m) = ln g(E, m).
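
For very small instances the spectral degeneracy can be obtained by exhaustive enumeration, which gives a reference against which sampled estimates can be checked. The Python sketch below does this for Q = 2, assuming E = |sum_i a_i s_i| and m = sum_i s_i; the names and the toy instance are hypothetical, and the cost grows as 2^N, so it is only practical for a handful of numbers.

```python
from collections import Counter
from itertools import product
from math import log


def spectral_degeneracy(numbers):
    """Count g(E, m) by visiting all 2^N spin configurations (tiny N only)."""
    g = Counter()
    for spins in product((-1, 1), repeat=len(numbers)):
        e = abs(sum(a * s for a, s in zip(numbers, spins)))
        m = sum(spins)
        g[(e, m)] += 1
    return g


numbers = [8, 7, 6, 5, 4]  # hypothetical toy instance
g = spectral_degeneracy(numbers)
entropy = {key: log(count) for key, count in g.items()}  # S = ln g

for (e, m), count in sorted(g.items()):
    print(e, m, count)
```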



3 The multicanonical method (Entropic Sampling) 

The multicanonical method was introduced in 1991 by Berg and Neuhaus [12]. The basic idea of this method is
to sample micro-configurations of a given system by performing a biased random walk (RW) in the configuration space
which leads to another, unbiased random walk (i.e. with uniform distribution) along the energy axis. For this to happen,
each configuration with energy E must be visited with a probability inversely proportional to g(E), the spectral
degeneracy introduced above. If one can measure the transition probabilities from an energy level E to all other energy levels, one
is able to obtain g(E). The multicanonical method has been shown to be very efficient in obtaining satisfactory
results for g(E) in a large variety of problems, such as evolutionary problems [19], phase equilibrium in binary lipid
bilayers and optimization problems (for reviews of the method see [?]). The Entropic Sampling Method
(ESM) [?], which we will use throughout this paper, has been proven to be an equivalent formulation of MUCA [?].
Here we are interested in the multiparametric formulation of the multicanonical method, since we must obtain the
spectral degeneracy g(E, m) [?] as a function of two parameters, E and m. Let E(X) and m(X) be the energy and
magnetization associated with the microstate X; the transition probability between two states X_i and X_f is given by



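    \tau(X_i \to X_f) = \min\left\{ 1, \, e^{-[S(E_f, m_f) - S(E_i, m_i)]} \right\} = \min\left\{ 1, \, \frac{g(E_i, m_i)}{g(E_f, m_f)} \right\}    (8)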



where S(E, m) = ln g(E, m) is the entropy, E_i = E(X_i) (E_f = E(X_f)) is the energy of the initial (final) state,
m_i = m(X_i) (m_f = m(X_f)) is the magnetization of the initial (final) state and g(E, m) is the spectral degeneracy.
The transition probability (eq. 8) satisfies a detailed balance equation and leads to a distribution of probabilities
where a state is sampled with probability ∝ 1/g(E, m). Since exactly g(E, m) states share each pair (E, m), the successive
visits along the energy axis follow a
uniform distribution, but unfortunately g(E, m) is not known a priori. One way of obtaining g(E, m) is to construct
it iteratively, as in the "Entropic Sampling" method proposed by Lee [?]. This method can be summarized as
follows:

Step 1: Start with S(E, m) = 0 for all states;

Step 2: Perform a few^1 unbiased RW steps in the configuration space and store H(E, m), the number of attempted
moves to each state with energy E and magnetization m (at this stage all moves are accepted);

Step 3: Update S(E, m) according to

    S(E, m) = \begin{cases} S(E, m) + \ln H(E, m), & \text{if } H(E, m) \neq 0 \\ S(E, m), & \text{otherwise} \end{cases}    (9)

Step 4: Perform a much longer MC run using the transition probability given by eq. (8), storing H(E, m);

Step 5: Repeat steps 3 and 4. Each repetition is considered one iteration.

Usually, the number of Monte Carlo steps used in step 4 increases with the number of iterations. For a detailed
description of the method see refs. [?, ?, ?]. We applied this algorithm to the problems of bi- and multi-partitioning
in order to obtain the entropy S(E, m), which is shown for the case of bi-partitioning in figure 1. These results were
obtained for a single instance (disorder realisation) of 100 integer numbers chosen randomly between and 10^°. 
All results shown are typical ones. It is important to stress that, after obtaining the entropy of the system through
extensive simulations, all thermodynamic averages for different cardinalities m can be computed with no need for any
further computational effort.
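
The five steps above can be sketched in Python for the bi-partitioning case. This is a minimal illustration under stated assumptions, not the authors' implementation: moves are single spin flips, the acceptance rule is min(1, exp(-dS)), the histogram counts the state occupied after each attempted move, the run length doubles at every iteration, and all names and default parameters are hypothetical.

```python
import math
import random


def entropic_sampling(numbers, iterations=5, short_run=50, long_run=2000, seed=0):
    """Estimate S(E, m) up to an additive constant (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(numbers)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    signed_sum = sum(a * s for a, s in zip(numbers, spins))
    m = sum(spins)
    entropy = {}  # Step 1: S(E, m) = 0 everywhere (missing keys count as 0).

    def update(histogram):
        # Step 3: S <- S + ln H wherever H is non-zero.
        for key, count in histogram.items():
            entropy[key] = entropy.get(key, 0.0) + math.log(count)

    # Step 2: short unbiased walk; every single-spin flip is accepted.
    histogram = {}
    for _ in range(short_run):
        k = rng.randrange(n)
        signed_sum -= 2 * numbers[k] * spins[k]
        m -= 2 * spins[k]
        spins[k] = -spins[k]
        key = (abs(signed_sum), m)
        histogram[key] = histogram.get(key, 0) + 1
    update(histogram)

    steps = long_run
    for _ in range(iterations):
        # Step 4: longer biased walk driven by the current entropy estimate.
        histogram = {}
        for _ in range(steps):
            k = rng.randrange(n)
            new_sum = signed_sum - 2 * numbers[k] * spins[k]
            new_m = m - 2 * spins[k]
            old_s = entropy.get((abs(signed_sum), m), 0.0)
            new_s = entropy.get((abs(new_sum), new_m), 0.0)
            delta = new_s - old_s
            if delta <= 0 or rng.random() < math.exp(-delta):
                spins[k] = -spins[k]
                signed_sum, m = new_sum, new_m
            key = (abs(signed_sum), m)
            histogram[key] = histogram.get(key, 0) + 1
        update(histogram)  # Step 5: repeat steps 3 and 4 ...
        steps *= 2         # ... with a longer run each time (assumed schedule).
    return entropy


estimate = entropic_sampling([8, 7, 6, 5, 4], seed=1)
for (e, m), s in sorted(estimate.items()):
    print(e, m, round(s, 2))
```

Only differences of the returned values are meaningful, since the procedure fixes S(E, m) up to an additive constant; for a toy instance such as this one, the differences can be compared with an exhaustive count of configurations.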
Figure 1: Typical entropy curve for the case of bi-partitioning. Here the entropy is computed for a single instance of
100 numbers chosen randomly between and 10^". The energy is normalized by the largest possible value E^ax = 10^^. 

In the next section we analyze this entropy in order to recover well-known results concerning bounds
of the (E, m) curve for nearly optimal solutions. Through the analysis of the entropy of the multi-partitioning problem
it will become clear that the change in complexity of this problem for Q > 2 is associated with fundamental changes in the
entropy curve.

4 Numerical results 

In ref. [?], Ferreira and Fontanari calculated, through the replica trick, analytical estimates for the average lower
bound of the energy E as a function of the magnetization (cardinality) m. By collapsing the z-axis onto the (x, y) plane
(here the (E, m) plane) we are able to see this limit and also the upper bound for E. In figure 2 we show our numerical
results and the analytical prediction of Ferreira and Fontanari. The energy is normalized appropriately.

In computer science one is only interested in the optimal solutions, that is, the information contained in the first
"slice" of the S(E, m) surface, the S(0, m) plane. In figure 3 we show S(ε, m), where ε is our numerical tolerance,
which we chose to be (E_max - E_min)/1024.

^1 When compared with the number of steps performed in step 4.



Figure 2: Collapse of all entropies S(e, m) on the (e, m) plane, evidencing the upper and lower bounds for e as a
function of the cardinality m. Again, e = E/E_max. The dashed line is the analytical prediction given by Ferreira and
Fontanari. Typical results for a single instance with N = 100 numbers.

One interesting feature we have observed numerically is that the maximum number of solutions does not occur for 
m = 0; it is easier to find solutions where the number of elements in each set is not exactly equal. This feature is not
characteristic of a particular instance. 
Figure 3: Entropy of an N = 100 instance as a function of the cardinality for nearly optimal solutions. The maximum
number of solutions does not correspond to an equipartition of the set, but to two subsets with N/2 - 1 and N/2 + 1
numbers, respectively. This result is not characteristic of a single instance (disorder realisation), but seems to appear
in most of the numerical results.

Now we show our results for the multi-partitioning problem, for which no reasonable deterministic algorithm that finds
optimal solutions for any instance exists. As far as we know, there is no theoretical study of the NPP for Q > 2. If
we look at the number of solutions g(E) = \sum_m g(E, m) (the entropy S(E) = ln g(E)) for different cardinalities, we
observe a fundamental difference between the Q = 2 and Q > 2 results (figure 4). For the case Q = 2, we confirm recent
results obtained by Mertens [?], who showed that the traditional number-partitioning problem (Q = 2) is essentially
equivalent to the random cost problem, the problem of finding the minimum in an exponentially long list of random
numbers, where no algorithm can be more efficient than random search. As the maximum of the entropy lies
near E = 0 in the Q = 2 NPP (fig. 1), any algorithm based on random moves will drive the system
close to the ground state.
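
Since the simulation yields the entropy S(E, m) rather than g(E, m) itself, the sum over m is most safely carried out in log space. The Python sketch below shows one way to do this with the log-sum-exp trick; the function name and the toy entropy table are hypothetical and are not data from this work.

```python
import math
from collections import defaultdict


def marginal_entropy(entropy_em):
    """Return S(E) = ln sum_m exp(S(E, m)) for every energy E in the table."""
    grouped = defaultdict(list)
    for (e, _m), s in entropy_em.items():
        grouped[e].append(s)
    result = {}
    for e, values in grouped.items():
        top = max(values)
        # Factor out the largest term so the exponentials cannot overflow.
        result[e] = top + math.log(sum(math.exp(v - top) for v in values))
    return result


# Hypothetical toy table of S(E, m) = ln g(E, m), for illustration only.
toy = {(0, 1): 0.0, (0, -1): 0.0, (2, 1): math.log(3), (2, -1): 0.0}
print(marginal_entropy(toy))
```

Working with entropies directly matters for realistic sizes: for N = 100 the degeneracies can be as large as 2^100, and the shifted sum keeps every exponential at or below 1 even when the entropies themselves are large.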

For Q > 2 we have a completely different scenario. The maximum of the entropy is not near E = 0; indeed, E ≈ 0
is a minimum, and the number of solutions with E ≈ 0 decreases at least exponentially with Q. This means that any
algorithm based on random moves would drive the system away from the ground state in the Q > 2 case.
The effects of this behavior on the construction of new algorithms have to be taken into account. For the traditional
differencing scheme [?-15] we can say that, if the number of nearly optimal solutions increases exponentially as
E → 0, it is always possible to find better and better solutions, the computational time spent on the search being
the only barrier. For the Q > 2 NPP this kind of procedure does not seem to work, since the number of solutions
decreases exponentially for E → 0.
Figure 4: The normalized entropy of the system as a function of the energy e for different numbers of partitions Q. The
fundamental difference between bi- and multi-partitioning is that for the latter the maximum of the entropy moves away
from the origin, which turns out to be a minimum, making it more difficult to find nearly optimal solutions.



5 Conclusions 

We showed in this paper that the multicanonical method for obtaining thermodynamic averages of statistical systems
can provide a tool for assessing the complexity of computer science problems, such as the number-partitioning problem
(NPP). This problem is one of the six basic computer science problems, according to Garey and Johnson, and
can be formulated as a spin-glass problem. Based on this analogy we proposed a statistical mechanics method for
computing the spectral degeneracy of the NPP, which gives us information about the number of solutions for
a given cost E and cardinality m. We have also studied an extension of this problem to Q partitions, for which
no good deterministic algorithm that finds optimal solutions exists. We observed a fundamental difference between
the classical (Q = 2) and the generalized (Q > 2) NPP, which could explain why it is so difficult to find good solutions for
the latter case. This information can be very useful in the construction of new algorithms.